| Objective | Complete |
|---|---|
| Use GridSearchCV to find the optimal number of nearest neighbors and define optimal model | |
| Discuss reasons we would or would not use kNN | |
GridSearchCV differs from plain cross-validation in that it performs an exhaustive search over a range of specified hyperparameter values, cross-validating the model at each one. We will use the GridSearchCV function to find the optimal number of neighbors (k).
# Define the parameter values that should be searched.
k_range = list(range(1, 31))
# Create a parameter grid: map the parameter names to the values that should be searched by building a Python dictionary.
# key: parameter name
# value: list of values that should be searched for that parameter
# single key-value pair for param_grid
param_grid = dict(n_neighbors = k_range)
print(param_grid)
{'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}
# Instantiate the grid using our original model - kNN.
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv = 10, scoring = 'accuracy')
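As a sanity check on the "exhaustive search" idea: GridSearchCV is equivalent to looping over every candidate value yourself and cross-validating each one. A minimal, self-contained sketch on synthetic data (the dataset and all names here are illustrative, not the lesson's data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_demo, y_demo = make_classification(n_samples=200, random_state=0)

# Manual exhaustive search: cross-validate each candidate k in turn.
manual_scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X_demo, y_demo, cv=5).mean()
    for k in range(1, 11)
}
best_k_manual = max(manual_scores, key=manual_scores.get)

# GridSearchCV runs the same loop internally and keeps the best estimator.
demo_grid = GridSearchCV(KNeighborsClassifier(),
                         {'n_neighbors': list(range(1, 11))}, cv=5)
demo_grid.fit(X_demo, y_demo)

print(best_k_manual == demo_grid.best_params_['n_neighbors'])
```

Both searches use the same cross-validation splits and the same accuracy scoring, so they agree on the winning k.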
# Create a pipeline of the scaler and gridsearch
grid_search_pipeline = Pipeline([('transformer', StandardScaler()), ('estimator', grid)])
# Fit Gridsearch pipeline
grid_search_pipeline.fit(X, y)
Pipeline(steps=[('transformer', StandardScaler()),
('estimator',
GridSearchCV(cv=10, estimator=KNeighborsClassifier(),
param_grid={'n_neighbors': [1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18,
19, 20, 21, 22, 23,
24, 25, 26, 27, 28,
29, 30]},
scoring='accuracy'))])
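A note on the design choice above: because the Pipeline wraps GridSearchCV, the StandardScaler is fit on all of X before cross-validation begins. The more common arrangement nests the Pipeline *inside* GridSearchCV, so the scaler is re-fit on each training fold. A hedged sketch on synthetic data (the demo dataset and variable names are assumptions, not from the lesson):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=150, random_state=0)

# The whole pipeline is the estimator; each CV fold re-fits the scaler.
pipe = Pipeline([('transformer', StandardScaler()),
                 ('estimator', KNeighborsClassifier())])

# Parameters of a pipeline step are addressed as <step_name>__<param>.
param_grid_pipe = {'estimator__n_neighbors': list(range(1, 11))}

grid_in_pipe = GridSearchCV(pipe, param_grid_pipe, cv=5, scoring='accuracy')
grid_in_pipe.fit(X_demo, y_demo)
print(grid_in_pipe.best_params_)
```

This ordering avoids letting the scaler see the validation folds during tuning.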
# Mean cross-validated accuracy for each value of k.
print(grid.cv_results_['mean_test_score'])
[0.91643836 0.94637965 0.94168297 0.94990215 0.9481409  0.95009785
 0.94951076 0.95088063 0.95029354 0.95107632 0.95068493 0.95088063
 0.95068493 0.95107632 0.95107632 0.95107632 0.95107632 0.95127202
 0.95127202 0.95127202 0.95127202 0.95127202 0.95127202 0.95127202
 0.95127202 0.95127202 0.95127202 0.95127202 0.95127202 0.95127202]
# Create a list of the mean scores only by using a list comprehension to loop through grid.cv_results_.
grid_mean_scores = [result for result in grid.cv_results_['mean_test_score']]
print(grid_mean_scores)
[0.9164383561643836, 0.9463796477495107, 0.941682974559687, 0.9499021526418787, 0.9481409001956946, 0.9500978473581212, 0.9495107632093933, 0.950880626223092, 0.9502935420743638, 0.9510763209393346, 0.9506849315068493, 0.950880626223092, 0.9506849315068493, 0.9510763209393346, 0.9510763209393346, 0.9510763209393346, 0.9510763209393346, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773, 0.9512720156555773]
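The scores plateau once k reaches the high teens, and GridSearchCV breaks ties in favor of the first (smallest) candidate. One way to read the best k off such a list of mean scores, shown with short, hypothetical stand-in values rather than the real results above:

```python
# Hypothetical mean test scores for k = 1..5 (stand-in values).
k_range_demo = list(range(1, 6))
mean_scores_demo = [0.91, 0.94, 0.95, 0.95, 0.94]

# list.index() returns the first occurrence, so ties resolve to the
# smallest k, matching GridSearchCV's tie-breaking behavior.
best_k = k_range_demo[mean_scores_demo.index(max(mean_scores_demo))]
print(best_k)  # → 3
```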
Now that we have found our optimal k, let’s examine our results:
# Single best score achieved across all the values of k that were tried.
grid_score = grid.best_score_
print(grid_score)
0.9512720156555773
# Dictionary containing the parameters (k) used to generate that score.
print(grid.best_params_)
{'n_neighbors': 18}
# Actual model object fit with those best parameters.
# Shows default parameters that we did not specify.
print(grid.best_estimator_)
KNeighborsClassifier(n_neighbors=18)
new_row = pd.DataFrame({'metrics' : ["accuracy"],
'values' : [round(grid_score, 4)],
'model': ['kNN_GridSearchCV']})
model_final = pd.concat([model_final, new_row], ignore_index=True)
print(model_final)
    metrics  values             model
0  accuracy  0.9419             kNN_k
1  accuracy  0.9513  kNN_GridSearchCV
kNN_best = grid.best_estimator_
# Check accuracy of our model on the test data.
print(kNN_best.score(X_test, y_test))
0.9458577951728636
We are going to pause for a moment to learn about the pickle library.
When we have objects we want to carry over and do not want to rerun code, we can "pickle" those objects.
In other words, pickle helps us save objects from one script/session and load them back in new scripts.
How do we do that? We use Python's built-in pickle module.
Pickling is similar to flattening an object into a file: the object is serialized to a byte stream that can later be restored.
# Save this final model, using the optimized model's test accuracy.
kNN_champ = kNN_best.score(X_test, y_test)
model_final = {'metrics' : "accuracy",
               'values' : round(kNN_champ, 4),
               'model' : 'kNN_optimized'}
print(model_final)
{'metrics': 'accuracy', 'values': 0.9459, 'model': 'kNN_optimized'}
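For example, a dictionary like the one above can be pickled to disk and reloaded in a later session. A minimal sketch (the filename and the demo dictionary are assumptions, not the lesson's actual artifacts):

```python
import pickle

# Stand-in for the model summary we want to carry over between sessions.
model_final_demo = {'metrics': 'accuracy', 'values': 0.9459, 'model': 'kNN_optimized'}

# Serialize ("flatten") the object to a byte stream on disk.
with open('model_final.pkl', 'wb') as f:
    pickle.dump(model_final_demo, f)

# In a later script/session, deserialize it back into a Python object.
with open('model_final.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored == model_final_demo)  # → True
```

The same pattern works for fitted estimators such as `kNN_best`, as long as they are loaded with the same library versions used to save them.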
| Objective | Complete |
|---|---|
| Use GridSearchCV to find the optimal number of nearest neighbors and define optimal model | ✔ |
| Discuss reasons we would or would not use kNN | |
**Pros**
- Simple to understand and implement, with no real training phase: the model is just the stored data.
- Non-parametric, so it makes no assumptions about the shape of the underlying data distribution.
- Handles multi-class problems naturally.

**Cons**
- Prediction is slow on large datasets, since each query must be compared against every stored point.
- Sensitive to feature scaling and to irrelevant features; distances become less meaningful in high dimensions (the "curse of dimensionality").
- Requires keeping the entire training set in memory.
You are now ready to try Tasks 15-19 in the Exercise for this topic.
| Objective | Complete |
|---|---|
| Use GridSearchCV to find the optimal number of nearest neighbors and define optimal model | ✔ |
| Discuss reasons we would or would not use kNN | ✔ |
In this part of the course, we have covered: